port range function and change gen_series logic #9352

Merged
merged 8 commits into from
Feb 29, 2024

Lordworms
Contributor

@Lordworms Lordworms commented Feb 27, 2024

Which issue does this PR close?

Closes #9323
Closes #9351

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Feb 27, 2024
let mut values = vec![];
let mut offsets = vec![0];
for (idx, stop) in stop_array.iter().enumerate() {
let stop = stop.unwrap_or(0) + include_upper;

Member

generate_series(i64::MAX, i64::MAX) will panic.

DataFusion CLI v36.0.0
❯ select generate_series(9223372036854775807, 9223372036854775807);
thread 'main' panicked at datafusion/functions-array/src/kernels.rs:296:20:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It can succeed in PostgreSQL and DuckDB.
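
The panic comes from the `stop + include_upper` expression in the diff: with `stop = i64::MAX` and `include_upper = 1`, the sum does not fit in an `i64`. As a minimal sketch (the helper name `adjusted_stop` is hypothetical and not part of DataFusion), `checked_add` surfaces the same condition as a `None` instead of a panic:

```rust
/// Hypothetical helper, not DataFusion code.
/// Mirrors the `stop + include_upper` expression from the diff, but returns
/// `None` instead of panicking when the sum does not fit in an `i64`.
fn adjusted_stop(stop: i64, include_upper: i64) -> Option<i64> {
    stop.checked_add(include_upper)
}
```

With this sketch, `adjusted_stop(4, 1)` gives `Some(5)`, while `adjusted_stop(i64::MAX, 1)` gives `None`, which a caller could turn into an error or handle as a special case.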

Contributor

I think we don't have i128Array yet, so probably this panic is unavoidable until we support it.

Member

And the result is incorrect when the step is a negative number.

DataFusion CLI v36.0.0
❯ select generate_series(5,1,-1);
+----------------------------------------------+
| generate_series(Int64(5),Int64(1),Int64(-1)) |
+----------------------------------------------+
| [5, 4, 3]                                    |
+----------------------------------------------+
1 row in set. Query took 0.005 seconds.

In DuckDB:

D select generate_series(5,1,-1);
┌───────────────────────────┐
│ generate_series(5, 1, -1) │
│          int64[]          │
├───────────────────────────┤
│ [5, 4, 3, 2, 1]           │
└───────────────────────────┘

Member

@jonahgao jonahgao Feb 27, 2024

> I think we don't have i128Array yet, so probably this panic is unavoidable until we support it.

I tried it in the following way, and then it worked, but I haven't checked it carefully yet.

    for (idx, stop) in stop_array.iter().enumerate() {
        let stop = stop.unwrap_or(0);
        let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
        let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
        if step == 0 {
            return exec_err!("step can't be 0 for function range(start [, stop, step])");
        }
        if step < 0 {
            // Decreasing range
            values.extend((stop + 1..start + 1).rev().step_by((-step) as usize));
        } else {
            // Increasing range
            values.extend((start..stop).step_by(step as usize));
        }
        // TODO: include_upper should be a boolean flag
        if include_upper > 0 {
            match values.last() {
                Some(&last) if last + step == stop => {
                    values.push(stop);
                }
                None => {
                    values.push(stop);
                }
                _ => {}
            }
        }
        offsets.push(values.len() as i32);
    }

UPDATE:
It still panics on the following queries:

  • select generate_series(9223372036854775807, 9223372036854775807, -1)
  • select generate_series(-9223372036854775807, -9223372036854775808, -2)

Additional checks might be needed regarding negative step values.
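
One way to cover both the negative-step behavior and the edge cases listed above is to advance with `checked_add` and stop when the next value would leave the `i64` range. The following is only a sketch under that assumption; it is not the code merged in this PR, and the standalone `generate_series` function and its `String` error type are hypothetical:

```rust
/// Hypothetical sketch, not the code merged in this PR.
/// Builds an inclusive series from `start` to `stop`; `step` may be negative.
fn generate_series(start: i64, stop: i64, step: i64) -> Result<Vec<i64>, String> {
    if step == 0 {
        return Err("step can't be 0".to_string());
    }
    let mut values = Vec::new();
    let mut current = start;
    // Continue while `current` has not passed `stop` in the direction of `step`.
    while (step > 0 && current <= stop) || (step < 0 && current >= stop) {
        values.push(current);
        // `checked_add` returns `None` instead of overflowing past the i64 limits.
        match current.checked_add(step) {
            Some(next) => current = next,
            None => break,
        }
    }
    Ok(values)
}
```

Under this sketch, `generate_series(5, 1, -1)` yields `[5, 4, 3, 2, 1]`, matching the DuckDB output quoted earlier, and `generate_series(i64::MAX, i64::MAX, -1)` yields `[i64::MAX]` without panicking. Because it never computes `stop + 1`, no wider integer type is required.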

Member

Leaving the edge cases for future handling is okay with me. However, the behavior for negative step is incorrect and needs to be fixed.

Contributor Author

Got it

# under the License.

query ?
SELECT range(5);

Contributor

There are already tests in array.slt. If you want to move them to a new file, don't forget those.

Member

How about moving these tests into array.slt? Their number is not large.

Contributor Author

I will merge them into array.slt.

/// gen_range(3) => [0, 1, 2]
/// gen_range(1, 4) => [1, 2, 3]
/// gen_range(1, 7, 2) => [1, 3, 5]
pub fn gen_range(

Contributor

Don't forget to delete the code in array_expression.rs.

make_udf_function!(
Range,
range,
input diamilter,

Member

I think it should be

Suggested change:
- input diamilter,
+ start stop step,

And it will be expanded to

    pub fn range(start: Expr, stop: Expr, step: Expr) -> Expr {
        Expr::ScalarFunction(
            ScalarFunction::new_udf(
                range_udf(),
                <[_]>::into_vec(
                    #[rustc_box]
                    ::alloc::boxed::Box::new([start, stop, step]),
                ),
            ),
        )
    }

cargo expand can help to check it.
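
To illustrate why the identifier list matters, here is a deliberately simplified stand-in for this kind of macro. The macro `make_expr_fn!` and the function `range_args` are hypothetical; DataFusion's real `make_udf_function!` builds `Expr` values, as the expansion above shows. Each identifier passed to the macro becomes one parameter of the generated function:

```rust
// Hypothetical miniature of an expression-function macro, for illustration only.
// It returns a Vec<i64> rather than building DataFusion `Expr` values.
macro_rules! make_expr_fn {
    ($name:ident, $($arg:ident)*) => {
        // One `i64` parameter is generated for every identifier passed in.
        fn $name($($arg: i64),*) -> Vec<i64> {
            vec![$($arg),*]
        }
    };
}

// Expands to `fn range_args(start: i64, stop: i64, step: i64) -> Vec<i64>`.
make_expr_fn!(range_args, start stop step);
```

Passing `input diamilter` instead would generate a two-parameter function with misleading names, which is the mismatch the suggested change corrects.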

Contributor Author

got it

make_udf_function!(
GenSeries,
gen_series,
input diamilter,

Member

Suggested change:
- input diamilter,
+ start stop step,

Similar to above.

Member

@jonahgao jonahgao left a comment

Looks good to me!
We might also need to update the documentation.

@Lordworms
Contributor Author

> Looks good to me! We might also need to update the documentation.

Sure, I'll do it right now.

@jonahgao
Member

I've added a few suggestions about the documentation; the rest looks okay to me 👍

Contributor

@jayzhan211 jayzhan211 left a comment

LGTM

@jayzhan211
Contributor

Thanks @Lordworms and @jonahgao!

@jayzhan211 jayzhan211 merged commit ca37ce3 into apache:main Feb 29, 2024
24 checks passed
@@ -2906,7 +2906,28 @@ empty(array)

### `generate_series`

_Alias of [range](#range)._
Similar to the range function, but it includes the upper bound.

Contributor

❤️
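
The documented difference between the two functions (`range` excludes the upper bound, `generate_series` includes it) maps directly onto Rust's half-open and inclusive ranges. The two functions below are hypothetical stand-ins for the SQL functions, restricted to positive steps, and are not DataFusion's implementation:

```rust
/// Hypothetical stand-in for SQL `range`: the upper bound is excluded.
/// Only positive steps are handled in this sketch.
fn range(start: i64, stop: i64, step: usize) -> Vec<i64> {
    (start..stop).step_by(step).collect()
}

/// Hypothetical stand-in for SQL `generate_series`: the upper bound is included.
/// Only positive steps are handled in this sketch.
fn generate_series(start: i64, stop: i64, step: usize) -> Vec<i64> {
    (start..=stop).step_by(step).collect()
}
```

For example, `range(1, 7, 2)` gives `[1, 3, 5]`, matching the `gen_range(1, 7, 2)` doc comment quoted earlier, while `generate_series(1, 7, 2)` gives `[1, 3, 5, 7]`.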

4 participants